Efficient Data-Sensitive Techniques for Parallel Retrieval of Keyword-indexed Information

Author

  • Rakesh M. Verma
Abstract

Keyword-based search of data such as documents, maps, images, audio, and video is an everyday activity for many millions of people, with myriad uses, e.g., scientific computing, digital libraries, the web, catalogs, geographical information systems, and music servers. In this paper, we present several declustering algorithms based on existing similarity measures as well as their generalizations. Experiments show that the new declustering methods take less time to decluster than quadratic declustering methods based on existing similarity measures, while achieving parallel query times close to theirs. Our declustering algorithms are sublinear in the number of comparisons and scale with increasing data and disks. The new methods can also handle streaming data efficiently. We also present some negative results on random and profile-based sampling. Although the new sampling strategies based on profiles of the documents outperform the old sampling strategies, which do not use a profile, they are still worse than random and round-robin. Further testing is required to confirm the sampling results.

Research supported in part by NSF grant CCF 0306475.
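The abstract names random and round-robin as the baselines the sampling strategies are compared against. As a minimal sketch of what those two baselines look like, assuming documents are identified by hashable IDs and that the parallel cost of a query is the largest number of matching documents any single disk has to read (a common simplification, not necessarily the paper's exact metric), one could write the following. All function names here are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def round_robin_decluster(doc_ids, num_disks):
    """Assign the i-th document to disk i mod num_disks."""
    return {doc: i % num_disks for i, doc in enumerate(doc_ids)}


def random_decluster(doc_ids, num_disks, seed=0):
    """Assign each document to a uniformly random disk (seeded for repeatability)."""
    rng = random.Random(seed)
    return {doc: rng.randrange(num_disks) for doc in doc_ids}


def parallel_query_cost(assignment, result_docs):
    """Assumed cost model: the busiest disk determines the parallel retrieval time."""
    load = Counter(assignment[doc] for doc in result_docs)
    return max(load.values(), default=0)


def optimal_cost(num_results, num_disks):
    """Lower bound on the cost: results spread perfectly evenly (ceiling division)."""
    return -(-num_results // num_disks)


docs = list(range(8))
rr = round_robin_decluster(docs, num_disks=4)
# Documents 0 and 4 share disk 0, so this query needs two reads on that disk.
cost = parallel_query_cost(rr, [0, 4, 1])
```

A declustering method does well on a query when `parallel_query_cost` is close to `optimal_cost`; similarity-based methods aim to place documents that tend to be retrieved together on different disks.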

Similar resources

An Effective Path-aware Approach for Keyword Search over Data Graphs

Abstract—Keyword Search is known as a user-friendly alternative for structured languages to retrieve information from graph-structured data. Efficient retrieving of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in the keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. In the proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method, is used in this paper to imp...

Fuzzy retrieval of encrypted data by multi-purpose data-structures

The growing amount of information arising from emerging technologies has caused organizations to face challenges in maintaining and managing their information. Expanding hardware and human resources, and outsourcing data management and maintenance to an external organization in the form of cloud storage services, are two common approaches to overcome these challenges. The first approach costs of...

Power-aware data retrieval protocols for indexed broadcast parallel channels

In pervasive and mobile computing environments, “timely and reliable” access to public data requires methods that allow quick, efficient, and low-power access to information to overcome technological limitations of wireless communication and access devices. Literature suggests broadcasting (one-way communication) as an effective way to disseminate the public data to mobile devices. Within the s...

Improving the Efficiency of Data Retrieval in Secure Cloud by Introducing Conjunction of Keywords

Cloud computing uses the internet and central remote servers to maintain data and applications. This allows much more efficient computing by centralizing storage, memory, processing, and bandwidth. The data is stored off-premises and accessed through keyword search. Traditional keyword search was based on plaintext keyword search. But to protect data privacy, the sensitive data shou...

Publication date: 2010